Confidence intervals for ratio of means of delta-lognormal distributions based on left-censored data with application to rainfall data in Thailand

Thailand is a country that is prone to both floods and droughts, and these natural disasters have significant impacts on the country's people, economy, and environment. Estimating rainfall is an important part of flood and drought prevention. Rainfall data typically contain both zero and positive observations, and the distribution of rainfall often follows the delta-lognormal distribution. However, it is important to note that rainfall data can be censored, meaning that some values may be missing or truncated. The interval estimator for the ratio of means is useful when comparing the means of two samples. The purpose of this article was to compare the performance of several approaches for statistically analyzing left-censored data. The performance of the confidence intervals was evaluated using the coverage probability and average length, which were assessed through Monte Carlo simulation. The approaches examined included several variations of the generalized confidence interval, the Bayesian, the parametric bootstrap, and the method of variance estimates recovery approaches. For (ξ1, ξ2) = (0.10, 0.10), simulations showed that the Bayesian approach would be a suitable choice for constructing the credible interval for the ratio of means of delta-lognormal distributions based on left-censored data. For (ξ1, ξ2) = (0.10, 0.25), the parametric bootstrap approach was a strong alternative for constructing the confidence interval. However, the generalized confidence interval approach can be considered for constructing the confidence interval as the sample sizes increase. Practical applications demonstrating the use of these techniques on rainfall data showed that the confidence interval based on the generalized confidence interval approach covered the ratio of population means and had the smallest length. The effectiveness of the proposed approaches was illustrated using daily rainfall datasets from the provinces of Chiang Rai and Chiang Mai in Thailand.


INTRODUCTION
Floods occur in Thailand primarily during the monsoon season, which typically lasts from May to October. During this time, heavy rainfall can cause rivers to overflow and inundate [...] or coefficients in a statistical model. Parameters can represent various aspects of a model, such as the coefficients of variables in a regression model, the proportions in a probability distribution, or the odds ratios in logistic regression. In the context of rainfall data analysis, the ratio of parameters is used to compare the effects of different variables on rainfall. For example, the ratio of the coefficients of temperature and humidity in a regression model is used to determine how each of these factors contributes to changes in rainfall. Moreover, the difference of means is likely to be minor when both means are small, and such a minor difference can lead to an inability to draw powerful or definite conclusions. Therefore, the ratio of means is often considered more accurate than the difference of means. The ratio of means is the ratio of two means and is used in many fields. For instance, in bioequivalence, the ratio of means is used to compare the mean of the test drug and the mean of the reference drug. In epidemiology, the ratio of means is used to compare the average levels of particulate matter with a diameter of less than 2.5 µm (PM2.5) in two areas. In climate sciences, the ratio of means is used to compare the daily rainfall averages of two areas. Confidence intervals for the ratio of means have been constructed in many research studies, such as those by Chen & Zhou (2006a).
In statistics, the information in a sample is used to make inferences about an unknown parameter. The inference methods are hypothesis testing and estimation (Casella & Berger, 2002). Estimation comprises point estimation and interval estimation, and it is of interest in many fields. For example, in the environment, Luo, Shen & Xu (2022) studied the modeling and estimation of system reliability under dynamic operating environments and lifetime ordering constraint. They used the maximum likelihood method for point estimation while proposing generalized inference methods for interval estimation. In industry, Zhang et al. (2022) studied the problem of reliability estimation for a parallel system when one stress variable is involved, referred to as the multicomponent stress-strength model.
The construction of confidence intervals is a crucial aspect of statistical inference, and many researchers have proposed various approaches for constructing such intervals. The generalized confidence interval (GCI) approach uses the concept of the generalized pivotal quantity (GPQ) to construct the confidence interval. Chen & Zhou (2006a) presented the GCI estimate for the ratio and the difference between the means of log-normal distributions. It gave a highly accurate coverage rate and fairly low bias, especially for small sample sizes. Tian & Wu (2007) proposed the GCI approach for inferences on the common mean of log-normal distributions. Ye, Ma & Wang (2010) proposed inferences on the common mean of several inverse Gaussian populations using the GCI approach.
The bootstrap approach relies heavily on computer simulations. Traditionally, standard errors have been calculated using well-known formulae, often based on assumptions that are not satisfied or only approximately satisfied. In some cases, it may not even be known whether the assumptions hold. In essence, the bootstrap approach relies on resampling with replacement from the given sample and calculating the required statistic from these repeated samples. The values of the statistic from the repeated sampling can then be used to generate standard errors and confidence intervals for the statistic (Dunn, 2001). Thangjai et al. (2023) constructed the confidence interval for the ratio of the percentiles of two delta-lognormal distributions based on the parametric bootstrap approach. Moreover, Altunkaynak & Gamgam (2019) proposed the bootstrap confidence intervals for the coefficient of quartile variation.
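The resampling idea described above can be illustrated with a minimal Python sketch. It is only an illustration of the generic nonparametric bootstrap, not the procedure used in this article; the function name `bootstrap_se_and_ci`, the data values, and the choice of a percentile interval are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_se_and_ci(sample, stat, n_boot=2000, alpha=0.05, seed=0):
    """Resample with replacement, recompute the statistic, summarise the replicates."""
    rng = random.Random(seed)
    n = len(sample)
    # Each replicate is the statistic evaluated on a resample of the same size.
    reps = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    se = statistics.stdev(reps)                   # bootstrap standard error
    lo = reps[int((alpha / 2) * n_boot)]          # lower percentile limit
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]  # upper percentile limit
    return se, (lo, hi)

# Hypothetical positive measurements, invented for illustration only.
data = [2.1, 0.4, 5.6, 1.3, 3.8, 0.9, 7.2, 2.7, 1.1, 4.4]
se, (lo, hi) = bootstrap_se_and_ci(data, statistics.mean)
print(f"bootstrap SE = {se:.3f}, 95% percentile interval = ({lo:.3f}, {hi:.3f})")
```

Any statistic that maps a list to a number can be passed as `stat`, which is what makes the approach attractive when no closed-form standard error is available.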
The method of variance estimates recovery (MOVER) approach utilizes the initial confidence interval of a single parameter of interest to construct the final confidence interval. Zou & Donner (2008) constructed the confidence limits about effect measures using the MOVER approach. Zou, Taleban & Hao (2009) proposed the MOVER approach to estimate the confidence interval for the log-normal distribution.
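To make the MOVER idea concrete, the sketch below combines two single-parameter intervals into an interval for a ratio. It uses a closed form that follows from setting the MOVER lower limit for the difference θ1 − Lθ2 to zero and solving for L (and likewise for the upper limit). Whether this article uses exactly this form is an assumption; the function name `mover_ratio` and the numerical inputs are hypothetical, and the sketch assumes all estimates and limits are positive.

```python
import math

def mover_ratio(est1, l1, u1, est2, l2, u2):
    """MOVER-type limits for est1/est2 from the separate intervals (l1, u1) and (l2, u2).

    Assumes positive estimates and limits, with 2*est2 - u2 > 0 and 2*est2 - l2 > 0.
    """
    prod = est1 * est2
    lower = (prod - math.sqrt(prod**2 - l1 * u2 * (2 * est1 - l1) * (2 * est2 - u2))) \
        / (u2 * (2 * est2 - u2))
    upper = (prod + math.sqrt(prod**2 - u1 * l2 * (2 * est1 - u1) * (2 * est2 - l2))) \
        / (l2 * (2 * est2 - l2))
    return lower, upper

# Hypothetical point estimates and 95% limits for two means.
lower, upper = mover_ratio(10.0, 8.0, 12.0, 5.0, 4.0, 6.0)
print(lower, upper)
```

When both input intervals collapse to their point estimates, the two limits collapse to `est1 / est2`, which is a quick sanity check on the formula.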
Statistics can be divided into two different techniques: the classical approach and the Bayesian approach. The classical approach includes techniques such as the GCI, parametric bootstrap, and MOVER. In this approach, the parameter of interest is unknown but fixed. In contrast, the Bayesian approach considers the parameter of interest as a random quantity, and its variation is described by the prior distribution. There are many reasons why a researcher may prefer to use Bayesian estimation over classical estimation. The main reason for choosing the Bayesian approach is that the models are often too complex for traditional methods to handle. It is important to note that, regardless of the reasons for implementing the Bayesian approach, conducting a sensitivity analysis of priors is always crucial and should be included (Depaoli, Winter & Visser, 2020). The impact of the priors is highly dependent on model complexity, and it is crucial to thoroughly examine their influence on the final model estimates. The Bayesian approach can effectively capture the features of the posteriors of the parameters of interest by combining information from data and priors. However, obtaining closed forms for each marginal posterior in Bayesian analysis is a challenging task. Markov chain Monte Carlo (MCMC) can be employed to obtain posterior samples from a set of Markov chains with respect to the parameters of interest and nuisance parameters. The iteration is terminated when all chains are stable and well-mixed. MCMC has been well-developed and widely utilized for complex models, allowing parameter estimation through posterior samples generated from a collection of Markov chains (Zhou et al., 2023). Thangjai et al. (2023) proposed the credible interval estimation for the ratio of the percentiles of two delta-lognormal distributions using the Bayesian approach. Furthermore, Thangjai & Niwitpong (2023) constructed the Bayesian credible interval for the mean and the difference between means of delta-lognormal distributions based on left-censored data. Moreover, the Bayesian approach is also widely used for uncertainty quantification (Zhuang, Xu & Wang, 2023). Aizpurua et al. (2022) studied how uncertainty quantification was incorporated into machine health prognostics through the Bayesian approach.
The rest of this article is organized as follows. 'Materials & Methods' describes the four approaches used to construct confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data. 'Results' presents the performance of the proposed approaches using simulation studies. 'Empirical Application' shows a real data example. 'Discussion' provides a discussion. Finally, 'Conclusions' presents concluding remarks.

MATERIALS & METHODS
Suppose $Z_1 = (Z_{11}, Z_{12}, \ldots, Z_{1n_1})$ is a random sample from a delta-lognormal distribution with parameters mean $\mu_1$, variance $\sigma_1^2$, and probability of obtaining a zero observation $\delta_1$. Similarly, let $Z_2 = (Z_{21}, Z_{22}, \ldots, Z_{2n_2})$ be a random sample from a delta-lognormal distribution with parameters mean $\mu_2$, variance $\sigma_2^2$, and probability of obtaining a zero observation $\delta_2$. The delta-lognormal distribution is a combination of zero and positive values: the zero values follow the binomial distribution and the positive values follow the log-normal distribution. The means of the delta-lognormal distributions for $Z_1$ and $Z_2$ are given by [...] (1) and [...] (2).

Suppose $X_1$ and $X_2$ are nonnegative random variables drawn from $Z_1$ and $Z_2$, respectively. In other words, $X_1 = (X_{11}, X_{12}, \ldots, X_{1n_1})$ and $X_2 = (X_{21}, X_{22}, \ldots, X_{2n_2})$ follow log-normal distributions, so that $Y_1 = \log(X_1)$ and $Y_2 = \log(X_2)$ follow normal distributions. Suppose that $\bar{X}_1$ and $\bar{X}_2$ are the means of $X_1$ and $X_2$, respectively. Moreover, suppose that $S_{X_1}^2$ and $S_{X_2}^2$ are the variances of $X_1$ and $X_2$, respectively.

Let $\log(\xi_1)$ be the censoring point. Let $n_{1(1)}$ be the number of observations less than or equal to the censoring point $\log(\xi_1)$ and let $n_{1(2)}$ be the number of observations greater than $\log(\xi_1)$. Let $Y_1 = (Y_{11}, Y_{12}, \ldots, Y_{1n_{1(2)}})$ be the observations above $\log(\xi_1)$. The mean and variance of $Y_1$ are given by [...] (3) and [...] (4). Suppose that $\phi$ and $\Phi$ are the density function and the distribution function of the standard normal distribution. According to Krishnamoorthy, Mallick & Mathew (2011), the maximum likelihood estimators of $\mu_1$ and $\sigma_1^2$ are given by

$$\hat{\mu}_1 = \bar{Y}_1 - \psi(h_1, a_1)\left(\bar{Y}_1 - \log(\xi_1)\right) \quad (5)$$

and [...], where [...]. The mean of the censored log-normal distribution is [...]. The estimator of the mean of the censored log-normal distribution is [...] (8).

Similarly, let $\log(\xi_2)$ be the censoring point. Let $n_{2(1)}$ be the number of observations less than or equal to $\log(\xi_2)$ and let $n_{2(2)}$ be the number of observations greater than $\log(\xi_2)$. Let $Y_2 = (Y_{21}, Y_{22}, \ldots, Y_{2n_{2(2)}})$ be the observations above $\log(\xi_2)$. The mean and variance of $Y_2$ are given by [...] (9) and [...] (10). The maximum likelihood estimators of $\mu_2$ and $\sigma_2^2$ are given by [...] and [...], where [...]. The mean of the censored log-normal distribution is [...]. The estimator of the mean of the censored log-normal distribution is [...] (14).

Therefore, the estimator of the ratio of means of censored log-normal distributions is given by

$$\hat{\theta} = \frac{\hat{\theta}_1}{\hat{\theta}_2} \quad (15)$$

where $\hat{\theta}_1$ and $\hat{\theta}_2$ are defined in Eqs. (8) and (14), respectively. The estimator of the ratio of means of censored log-normal distributions, defined in Eq. (15), is used to construct the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data. Here, four newly proposed approaches are applied to construct the confidence intervals. Next, the computation of the GCI, Bayesian, parametric bootstrap, and MOVER approaches is explained.
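The data structure described above can be made concrete with a short simulation. The sketch below draws delta-lognormal data, compares the sample mean with the standard delta-lognormal mean $(1-\delta)\exp(\mu + \sigma^2/2)$, and splits the log-transformed positive values at a censoring point $\log(\xi)$. The standard mean formula is a well-known property of the distribution, but its agreement with the article's own notation is an assumption; the function names, the parameter values, and the way zeros are kept separate from censored values are also assumptions made for illustration.

```python
import math
import random

def simulate_delta_lognormal(n, mu, sigma, delta, rng):
    """Draw n values: zero with probability delta, otherwise log-normal(mu, sigma)."""
    return [0.0 if rng.random() < delta else rng.lognormvariate(mu, sigma)
            for _ in range(n)]

def split_left_censored(z, xi):
    """Return (count at or below log(xi), log-values above log(xi)) for positive data."""
    logs = [math.log(v) for v in z if v > 0]         # zeros are handled separately
    observed = [y for y in logs if y > math.log(xi)]
    n_below = len(logs) - len(observed)              # left-censored observations
    return n_below, observed

rng = random.Random(1)
mu, sigma, delta = 0.5, 1.0, 0.3                     # hypothetical parameter values
z = simulate_delta_lognormal(200_000, mu, sigma, delta, rng)

theoretical_mean = (1 - delta) * math.exp(mu + sigma**2 / 2)
empirical_mean = sum(z) / len(z)
n_below, observed = split_left_censored(z, xi=0.25)

print(f"theoretical mean = {theoretical_mean:.4f}, empirical mean = {empirical_mean:.4f}")
print(f"censored count = {n_below}, observed count = {len(observed)}")
```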

Generalized confidence interval approach
The generalized pivotal quantity (GPQ) is used to construct the GCI, which is defined in Weerahandi (1993). According to Krishnamoorthy, Mallick & Mathew (2011), the GPQs for $\mu_1$, $\sigma_1$, and $\theta_1$ are defined by [...], where $\hat{\mu}_1^*$ and $\hat{\sigma}_1^*$ are the maximum likelihood estimators based on a censored sample from the standard normal distribution.

Similarly, the GPQs for $\mu_2$, $\sigma_2$, and $\theta_2$ are defined by [...], where $\hat{\mu}_2^*$ and $\hat{\sigma}_2^*$ are the maximum likelihood estimators based on a censored sample from the standard normal distribution.

The GPQ for the difference between means of delta-lognormal distributions based on left-censored data was used as previously described in Thangjai & Niwitpong (2023). In this article, the GPQ for the ratio of means of delta-lognormal distributions based on left-censored data is given by

$$R_{\theta} = \frac{R_{\theta_1}}{R_{\theta_2}} \quad (22)$$

where $R_{\theta_1}$ and $R_{\theta_2}$ are defined in Eqs. (18) and (21), respectively. Therefore, the $100(1-\alpha)\%$ two-sided confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data using the GCI approach is given by

$$CI_{GCI} = [L_{GCI}, U_{GCI}] = [R_{\theta}(\alpha/2), R_{\theta}(1-\alpha/2)]$$

where $R_{\theta}(\alpha/2)$ and $R_{\theta}(1-\alpha/2)$ denote the $100(\alpha/2)$-th and $100(1-\alpha/2)$-th percentiles of $R_{\theta}$, respectively.
Algorithm 1 is used to construct the GCI for the ratio of means of delta-lognormal distributions based on left-censored data.
Step 4: Compute $R_{\theta}$ from Eq. (22)
Step 5: Repeat Steps 1 to 4 a total of $m$ times and obtain an array of $R_{\theta}$'s
Step 6: Compute $L_{GCI}$ and $U_{GCI}$
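Steps 4 to 6 can be sketched in Python as follows. Because the GPQ expressions for $\theta_1$ and $\theta_2$ are not reproduced here, the arrays `r_theta1` and `r_theta2` are filled with stand-in log-normal draws; they are placeholders, not the article's GPQs. The helper names `percentile` and `gci_ratio` and the interpolation rule are likewise assumptions.

```python
import random

def percentile(sorted_vals, p):
    """Linear-interpolation percentile for p in [0, 1] on an already sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def gci_ratio(r_theta1, r_theta2, alpha=0.05):
    """Form R_theta = R_theta1 / R_theta2 draw by draw and take its percentiles."""
    r_theta = sorted(a / b for a, b in zip(r_theta1, r_theta2))
    return percentile(r_theta, alpha / 2), percentile(r_theta, 1 - alpha / 2)

rng = random.Random(2)
m = 2500  # number of GPQ draws, matching the m reported in 'Results'
# Stand-in draws only: real GPQ draws would come from Steps 1 to 3 of Algorithm 1.
r_theta1 = [rng.lognormvariate(1.0, 0.2) for _ in range(m)]
r_theta2 = [rng.lognormvariate(0.5, 0.2) for _ in range(m)]

l_gci, u_gci = gci_ratio(r_theta1, r_theta2)
print(f"L_GCI = {l_gci:.4f}, U_GCI = {u_gci:.4f}")
```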

Bayesian approach
The Bayesian approach offers a framework for updating beliefs and making predictions using new evidence or data. It is rooted in Bayes' theorem, which combines prior probability and likelihood to calculate the posterior probability. The prior distribution represents uncertainty about parameters before observing data. In this article, we employed the Jeffreys independence prior. According to Thangjai & Niwitpong (2023), the posterior distributions of $\sigma_1^2$, $\mu_1$, and $\theta_{1.BS}$ are defined by [...], where $\bar{y}_1$ is the observed value of $\bar{Y}_1$ defined in Eq. (3) and $s_1^2$ is the observed value of $S_1^2$ defined in Eq. (4).

The posterior distributions of $\sigma_2^2$, $\mu_2$, and $\theta_{2.BS}$ are defined by [...], where $\bar{y}_2$ is the observed value of $\bar{Y}_2$ defined in Eq. (9) and $s_2^2$ is the observed value of $S_2^2$ defined in Eq. (10).

Thangjai & Niwitpong (2023) proposed the posterior distribution of the difference between means of delta-lognormal distributions based on left-censored data. Therefore, the posterior distribution of the ratio of means of delta-lognormal distributions based on left-censored data is given by

$$\theta_{BS} = \frac{\theta_{1.BS}}{\theta_{2.BS}}$$

where $\theta_{1.BS}$ and $\theta_{2.BS}$ are defined in Eqs. (26) and (29), respectively. Therefore, the $100(1-\alpha)\%$ two-sided credible interval for the ratio of means of delta-lognormal distributions based on left-censored data using the Bayesian approach is given by

$$CI_{BS} = [L_{\theta.BS}, U_{\theta.BS}]$$

where $L_{\theta.BS}$ and $U_{\theta.BS}$ denote the lower and upper limits of the shortest $100(1-\alpha)\%$ highest posterior density interval of $\theta_{BS}$, respectively. Algorithm 2 is used to construct the Bayesian credible interval for the ratio of means of delta-lognormal distributions based on left-censored data.
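The shortest highest posterior density (HPD) interval can be approximated from posterior draws by scanning all windows that contain the required share of sorted draws and keeping the narrowest one. The sketch below shows this generic computation. The posterior draws are stand-ins generated from log-normal distributions, since the posterior expressions are not reproduced here, and the function name `hpd_interval` is hypothetical.

```python
import math
import random

def hpd_interval(samples, cred=0.95):
    """Shortest interval containing at least a share `cred` of the sorted draws."""
    s = sorted(samples)
    n = len(s)
    k = min(n, math.ceil(cred * n))  # number of draws the interval must contain
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]

rng = random.Random(4)
# Stand-in posterior draws for theta_1.BS and theta_2.BS (assumption for illustration).
theta1_bs = [rng.lognormvariate(1.0, 0.3) for _ in range(4000)]
theta2_bs = [rng.lognormvariate(0.5, 0.3) for _ in range(4000)]
theta_bs = [a / b for a, b in zip(theta1_bs, theta2_bs)]  # draws of the ratio

l_bs, u_bs = hpd_interval(theta_bs)
print(f"95% HPD interval for the ratio: ({l_bs:.4f}, {u_bs:.4f})")
```

For a right-skewed posterior such as a ratio, the HPD interval is typically shifted to the left of the equal-tailed interval, which is why it tends to be shorter.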

Parametric bootstrap approach
[...] Let $Y_1^* = (Y_{11}^*, Y_{12}^*, \ldots, Y_{1n_{1(2)}}^*)$ be the observations above $\log(\xi_1)$. The mean and variance of $Y_1^*$ are given by [...] (32) and [...] (33). The estimator of the mean of the delta-lognormal distribution based on left-censored data is [...] (34), where $\bar{Y}_1^*$ and $S_1^{2*}$ are defined in Eqs. (32) and (33), respectively. [...] (35) and [...] (36). The estimator of the mean of the delta-lognormal distribution based on left-censored data is [...] (37), where $\bar{Y}_2^*$ and $S_2^{2*}$ are defined in Eqs. (35) and (36), respectively.

The estimator of the difference between means of delta-lognormal distributions based on left-censored data was proposed as previously described in Thangjai & Niwitpong (2023). According to Thangjai & Niwitpong (2023), the estimator of the ratio of means of delta-lognormal distributions based on left-censored data is given by

$$\hat{\theta}^* = \frac{\hat{\theta}_1^*}{\hat{\theta}_2^*}$$

where $\hat{\theta}_1^*$ and $\hat{\theta}_2^*$ are defined in Eqs. (34) and (37), respectively. The lower and upper limits of the confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data are given by [...] (39) and [...] (40), where $\bar{\hat{\theta}}^*$ is the mean of $\hat{\theta}^*$, $sd(\hat{\theta}^*)$ is the standard deviation of $\hat{\theta}^*$, and $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$-th percentile of the standard normal distribution. Therefore, the $100(1-\alpha)\%$ two-sided confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data using the parametric bootstrap approach is given by

$$CI_{PB} = [L_{PB}, U_{PB}]$$

where $L_{PB}$ and $U_{PB}$ are defined in Eqs. (39) and (40), respectively. Algorithm 3 is used to construct the parametric bootstrap confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data.
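A parametric bootstrap interval built from the mean and standard deviation of the bootstrap ratios can be sketched as below. Two simplifications are assumed and should be read as such: the interval is taken to be the bootstrap mean plus or minus $z_{1-\alpha/2}$ times the bootstrap standard deviation, which is consistent with the quantities listed above but is not confirmed to be the exact form of Eqs. (39) and (40); and the plug-in mean ignores the censoring adjustment. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics

def draw(n, mu, sigma, delta, rng):
    """One parametric bootstrap sample from a fitted delta-lognormal model."""
    return [0.0 if rng.random() < delta else rng.lognormvariate(mu, sigma)
            for _ in range(n)]

def plug_in_mean(z):
    """Plug-in delta-lognormal mean (no censoring adjustment; needs >= 2 positives)."""
    logs = [math.log(v) for v in z if v > 0]
    delta_hat = 1 - len(logs) / len(z)
    return (1 - delta_hat) * math.exp(statistics.mean(logs) + statistics.variance(logs) / 2)

def pb_ratio_interval(par1, par2, m=1000, alpha=0.05, seed=3):
    """Assumed normal-approximation interval: mean of ratios +/- z * sd of ratios."""
    rng = random.Random(seed)
    ratios = [plug_in_mean(draw(*par1, rng)) / plug_in_mean(draw(*par2, rng))
              for _ in range(m)]
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    centre, spread = statistics.mean(ratios), statistics.stdev(ratios)
    return centre - z * spread, centre + z * spread

# Hypothetical fitted values (n, mu, sigma, delta) for the two samples.
l_pb, u_pb = pb_ratio_interval((50, 1.0, 0.5, 0.3), (50, 0.5, 0.5, 0.3))
print(f"L_PB = {l_pb:.4f}, U_PB = {u_pb:.4f}")
```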

RESULTS
In this section, we conducted simulation studies to evaluate the performance of the proposed confidence intervals, which were constructed using four different approaches. We calculated the coverage probability and average length using R software. The criteria for choosing the best performing confidence interval were a coverage probability greater than or equal to 0.95 and the shortest average length for each tested scenario. For each generated data set, we used R code to compute the confidence intervals based on the GCI, Bayesian, parametric bootstrap, and MOVER approaches (Thangjai & Niwitpong, 2023), with M = 5,000 runs for each and m = 2,500 runs for the GCI, Bayesian, and parametric bootstrap approaches. Following Owen & De Rouen (1980), the standardized sensitivity for many air contaminants is typically around 0.25. Although 0.25 was determined to be the most appropriate value for ξ when examining the censoring techniques, additional runs were performed using ξ = 0.10, allowing for the examination of results for the delta distribution with different values of µ and σ. We generated random samples of sizes $(n_1, n_2)$ = (20,20), (30,30), (20,30), (50,50), (30,50), (100,100), and (50,100) with specific parameters, as described in Table 1. Algorithm 4 is used to construct the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data, and then the coverage probability and average length of the confidence intervals are computed.
Step 1: Generate $z_1$ from the delta-lognormal distribution with parameters $\mu_1$, $\sigma_1$, and $\delta_1$ and set $x_1$ from the log-normal distribution with parameters $\mu_1$ and $\sigma_1$
Step 2: Generate $z_2$ from the delta-lognormal distribution with parameters $\mu_2$, $\sigma_2$, and $\delta_2$ and set $x_2$ from the log-normal distribution with parameters $\mu_2$ and $\sigma_2$
Step 3: Compute $y_1 = \log(x_1)$ and select $y_1 > \log(\xi_1)$
Step 4: Compute $y_2 = \log(x_2)$ and select $y_2 > \log(\xi_2)$
Step 5: Compute $n_{1(1)}$, $n_{1(2)}$, $n_{2(1)}$, $n_{2(2)}$, $\hat{\mu}_1$, $\hat{\mu}_2$, $\hat{\sigma}_1$, $\hat{\sigma}_2$, $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}$
Step 6: Construct the confidence intervals $CI_{GCI}$, $CI_{BS}$, $CI_{PB}$, and $CI_{MOVER}$
Step 7: If $L \le \theta \le U$, set $p = 1$; else set $p = 0$
Step 8: Compute $U - L$
Step 9: Repeat Steps 1 to 8 a total of $M$ times
Step 10: Compute the mean of $p$, which defines the coverage probability
Step 11: Compute the mean of $U - L$, which defines the average length

The coverage probability and average length of the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data are presented in Table 2 and shown in Figs. 1-3. For run 0, the Bayesian approach outperforms the others in terms of both coverage probability and average length for all sample sizes. Overall, the coverage probabilities are less than or equal to the nominal confidence level of 0.95. Therefore, we used the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data to estimate the ratio of means for datasets containing zero, positive, and censored observations. For runs 1-8, the results show that the coverage probabilities of the confidence intervals based on the GCI, Bayesian, parametric bootstrap, and MOVER approaches were mostly greater than the nominal confidence level of 0.95. The average lengths of the confidence interval based on the Bayesian approach were the shortest for $(\xi_1, \xi_2)$ = (0.10, 0.10), while the average lengths of the confidence interval based on the parametric bootstrap approach were shorter than those of the others for $(\xi_1, \xi_2)$ = (0.10, 0.25). The results indicate that the Bayesian approach is recommended for constructing the credible interval for the ratio of means of delta-lognormal distributions based on left-censored data for $(\xi_1, \xi_2)$ = (0.10, 0.10). However, the parametric bootstrap approach can be used to estimate the confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data for $(\xi_1, \xi_2)$ = (0.10, 0.25). Moreover, the GCI approach can be used to construct the confidence interval for the ratio of means of delta-lognormal distributions based on left-censored data for run 3 and run 5 as the sample sizes increase.

Notes. Bold font indicates the confidence interval with coverage probability greater than or equal to 0.95 and the shortest average length.
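The evaluation loop in Steps 7 to 11 of Algorithm 4 can be sketched generically in Python. To keep the example checkable, the four interval methods are replaced by a stand-in: a z-interval for a normal mean with known standard deviation, whose coverage should be close to 0.95. The function names and the stand-in model are assumptions; only the counting of hits and the averaging of lengths mirror the algorithm.

```python
import math
import random
import statistics

def coverage_and_length(simulate, interval, true_value, M, rng):
    """Monte Carlo coverage probability and average length of an interval method."""
    hits, total_length = 0, 0.0
    for _ in range(M):
        lower, upper = interval(simulate(rng))
        hits += lower <= true_value <= upper   # p = 1 if the interval covers theta
        total_length += upper - lower          # accumulate U - L
    return hits / M, total_length / M

def simulate(rng, n=20):
    """Stand-in data generator: n draws from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def z_interval(sample, sigma=1.0, alpha=0.05):
    """Stand-in interval method: known-sigma z-interval for the mean."""
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma / math.sqrt(len(sample))
    centre = statistics.mean(sample)
    return centre - half, centre + half

cp, al = coverage_and_length(simulate, z_interval, true_value=0.0,
                             M=5000, rng=random.Random(5))
print(f"coverage probability = {cp:.4f}, average length = {al:.4f}")
```

Swapping `simulate` for a left-censored delta-lognormal generator and `z_interval` for one of the four interval methods would give the structure of the simulation study, with `true_value` set to the true ratio of means.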

EMPIRICAL APPLICATION
The GCI, Bayesian, parametric bootstrap, and MOVER approaches discussed in 'Materials & Methods' can be applied to estimate the ratio of the average daily rainfall in Chiang Rai and Chiang Mai provinces in Thailand. Table 3 shows the daily rainfall data from June 1st to June 30th, 2022, presented by the Thai Meteorological Department. The table includes 30 observations, of which 13 of 30 (43.33%) are positive observed values in Chiang Rai province and nine of 30 (30.00%) are positive observed values in Chiang Mai province. Table 4 shows the possible distributions for the positive rainfall data, compared using the minimum Akaike information criterion (AIC). Figure 4 presents the densities of the daily rainfall data in Chiang Rai and Chiang Mai provinces. Figure 5 presents the histograms of the daily rainfall data in Chiang Rai and Chiang Mai provinces. Figure 6 presents the normal QQ-plots of the log-transformed daily rainfall data in Chiang Rai and Chiang Mai provinces. The log-transformed positive daily rainfall values of Chiang Rai and Chiang Mai provinces follow normal distributions. Therefore, the daily rainfall datasets in Chiang Rai and Chiang Mai provinces fit the delta-lognormal distributions.

[...] with an interval length of 40960.7200. The lower and upper limits of the 95% confidence interval correspond to the 2.50-th and 97.50-th percentiles of the ratio of average rainfall between Chiang Rai and Chiang Mai provinces. The GCI approach has the shortest interval length and is therefore recommended for constructing the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data. Moreover, confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data can be applied to environmental, meteorological, and climatological data, which often consist of positive values or exhibit right-skewed distributions, such as PM2.5 and PM10.

DISCUSSION
The ratio of parameters focuses on the relative strength or proportion of effects, while the difference of parameters emphasizes the absolute difference in the effects of two variables. Both concepts are valuable in statistical analysis, and their application in rainfall data analysis depends on the specific research question and the variables being examined. In bioassays, the ratio quantities are of potential interest. Calculating relative potency necessitates estimating the ratio of normal means. This is due to the fact that the ratio of means represents the expected values of the least squares estimates in a simple linear regression. Moreover, the problem of estimating the unoriented direction of a mean vector can lead to ratio estimation, as the direction is fully specified by the collection of all ratios of the component means (James, 1982). Therefore, the ratio of means is important. Several researchers have studied interval estimation for the ratio of means. For example, in environmental science, Zhang et al. (2021) constructed simultaneous confidence intervals

Figure 1 Comparison of the coverage probabilities and average lengths of the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data according to sample sizes. (A) Coverage probabilities. (B) Average lengths. DOI: 10.7717/peerj.16397/fig-1

Figure 3 Comparison of the coverage probabilities and average lengths of the confidence intervals for the ratio of means of delta-lognormal distributions based on left-censored data according to standard deviations. (A) Coverage probabilities. (B) Average lengths. DOI: 10.7717/peerj.16397/fig-3

Figure 6 Normal QQ-plots of the log-transformed daily rainfall data in Chiang Rai and Chiang Mai provinces. (A) Chiang Rai Province. (B) Chiang Mai Province. DOI: 10.7717/peerj.16397/fig-6

Table 4 The estimated AIC values for the four probability models, calculated using rainfall data in Chiang Rai and Chiang Mai provinces.
Columns: Distribution | Chiang Rai province | Chiang Mai province
Notes. Bold font indicates the distribution with the lowest AIC value.
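As an illustration of how AIC values such as those in Table 4 are obtained, the sketch below computes $AIC = 2k - 2\log L$ at the maximum likelihood estimates for two candidate models. The four models compared in the article are not listed here, so the log-normal and exponential models are stand-ins chosen for illustration, and the data values are invented rather than taken from Table 3.

```python
import math

def aic_lognormal(x):
    """AIC of a log-normal fit (k = 2 parameters) at the maximum likelihood estimates."""
    logs = [math.log(v) for v in x]
    n = len(x)
    mu = sum(logs) / n
    var = sum((y - mu) ** 2 for y in logs) / n  # ML variance uses divisor n
    loglik = -sum(logs) - (n / 2) * math.log(2 * math.pi * var) - n / 2
    return 2 * 2 - 2 * loglik

def aic_exponential(x):
    """AIC of an exponential fit (k = 1 parameter) at the maximum likelihood estimate."""
    n = len(x)
    rate = n / sum(x)                           # ML estimate of the rate
    loglik = n * math.log(rate) - rate * sum(x)
    return 2 * 1 - 2 * loglik

# Invented, strongly right-skewed positive values (not the article's rainfall data).
rain = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0, 100.0]
aic_ln, aic_exp = aic_lognormal(rain), aic_exponential(rain)
print(f"AIC log-normal = {aic_ln:.2f}, AIC exponential = {aic_exp:.2f}")
```

The model with the smaller AIC is preferred; for these invented values the log-normal fit has the lower AIC.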